Renton
Hierarchical Federated Unlearning for Large Language Models
Zhong, Yisheng, Yang, Zhengbang, Zhu, Zhuangdi
Large Language Models (LLMs) are increasingly integrated into real-world applications, raising concerns about privacy, security, and the need to remove undesirable knowledge. Machine Unlearning has emerged as a promising solution, yet faces two key challenges: (1) practical unlearning needs are often continuous and heterogeneous, and (2) they involve decentralized, sensitive data with asymmetric access. These factors result in inter-domain and intra-domain interference, which further amplifies the difficulty of balancing forgetting against retained performance. In response, we propose a scalable, privacy-preserving federated unlearning approach for LLMs. Our method decouples unlearning and retention via task-specific adapter learning and employs a hierarchical merging strategy that mitigates conflicting objectives and enables robust, adaptable unlearning updates. Comprehensive experiments on the WMDP, MUSE, and TOFU benchmarks show that our approach effectively handles heterogeneous unlearning requests while maintaining stronger LLM utility than baseline methods.
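To make the merging idea concrete, below is a minimal sketch (illustrative only, not the authors' implementation) of hierarchical adapter merging: adapter deltas are first averaged within each domain, and the per-domain results are then averaged into a single global update. The names and the use of plain averaging are assumptions.

```python
# Minimal sketch (not the authors' code): hierarchical merging of task-specific
# adapter deltas. Adapters are grouped by domain; deltas are averaged within each
# domain first (intra-domain), then the domain-level results are averaged
# (inter-domain) into one unlearning update. All names are illustrative.
import numpy as np

def merge_adapters(domain_to_adapters: dict[str, list[np.ndarray]]) -> np.ndarray:
    # Intra-domain merge: average adapter deltas submitted within one domain.
    domain_deltas = [np.mean(adapters, axis=0) for adapters in domain_to_adapters.values()]
    # Inter-domain merge: average the per-domain deltas into one global update.
    return np.mean(domain_deltas, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adapters = {
        "bio": [rng.normal(size=(4, 4)) for _ in range(3)],
        "cyber": [rng.normal(size=(4, 4)) for _ in range(2)],
    }
    update = merge_adapters(adapters)
    print(update.shape)  # (4, 4)
```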
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > Washington > King County > Renton (0.04)
- (2 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
Robust LLM Training Infrastructure at ByteDance
Wan, Borui, Liu, Gaohong, Song, Zuquan, Wang, Jun, Zhang, Yun, Sheng, Guangming, Wang, Shuguang, Wei, Houmin, Wang, Chenyuan, Lou, Weiqiang, Yang, Xi, Zhang, Mofan, Jiang, Kaihua, Ren, Cheng, Zhi, Xiaoyun, Yu, Menghan, Nan, Zhe, Zheng, Zhuolin, Zhong, Baoquan, Wang, Qinlong, Yu, Huan, Chi, Jinxin, Zhang, Wang, Li, Yuhan, Du, Zixian, Zhao, Sida, Zhang, Yongqiang, Tang, Jingzhe, Liu, Zherui, Wu, Chuan, Peng, Yanghua, Lin, Haibin, Xiao, Wencong, Liu, Xin, Xiang, Liang
The training scale of large language models (LLMs) has reached tens of thousands of GPUs and is still continuously expanding, enabling faster training of larger models. Accompanying the expansion of the resource scale is the prevalence of failures (CUDA errors, NaN values, job hangs, etc.), which poses significant challenges to training stability. Any large-scale LLM training infrastructure should strive for minimal training interruption, efficient fault diagnosis, and effective failure tolerance to enable highly efficient continuous training. This paper presents ByteRobust, a large-scale GPU infrastructure management system tailored for robust and stable training of LLMs. It exploits the uniqueness of the LLM training process and gives top priority to detecting and recovering from failures in a routine manner. Leveraging the parallelisms and characteristics of LLM training, ByteRobust enables high-capacity fault tolerance, prompt fault demarcation, and localization with an effective data-driven approach, comprehensively ensuring continuous and efficient training of LLM tasks. ByteRobust is deployed on a production GPU platform and achieves an ETTR of 97% for a three-month training job on 9,600 GPUs.
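As a rough illustration of the routine detect-and-recover idea, here is a toy sketch (assumptions only, not ByteRobust's code) of a training loop that catches injected faults standing in for CUDA errors or NaN losses, logs them for demarcation, and resumes from the last checkpoint.

```python
# Minimal sketch: a routine detect-and-recover loop around training steps.
# Failures (a raised RuntimeError standing in for CUDA errors or NaN losses)
# are caught, logged for demarcation, and training resumes from the last
# checkpoint. Names and the failure model are illustrative assumptions.
import math, random

def train_step(step: int) -> float:
    loss = 1.0 / (step + 1)
    if random.random() < 0.05:           # injected fault standing in for a hardware error
        raise RuntimeError(f"fault at step {step}")
    return loss

def robust_training(total_steps: int, checkpoint_every: int = 10) -> None:
    step, last_ckpt = 0, 0
    while step < total_steps:
        try:
            loss = train_step(step)
            if not math.isfinite(loss):
                raise RuntimeError(f"NaN loss at step {step}")
            if step % checkpoint_every == 0:
                last_ckpt = step                  # persist a checkpoint here
            step += 1
        except RuntimeError as err:
            print(f"[demarcation] {err}; resuming from checkpoint {last_ckpt}")
            step = last_ckpt                      # roll back to the last good state

if __name__ == "__main__":
    random.seed(0)
    robust_training(50)
```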
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (15 more...)
- Information Technology (0.47)
- Energy (0.46)
Laminar: A Scalable Asynchronous RL Post-Training Framework
Sheng, Guangming, Tong, Yuxuan, Wan, Borui, Zhang, Wang, Jia, Chaobo, Wu, Xibin, Wu, Yuqi, Li, Xiang, Zhang, Chi, Peng, Yanghua, Lin, Haibin, Liu, Xin, Wu, Chuan
Reinforcement learning (RL) post-training for Large Language Models (LLMs) is now scaling to large clusters and running for extended durations to enhance model reasoning performance. However, the scalability of existing RL frameworks is limited, as extreme long-tail skewness in RL trajectory generation causes severe GPU underutilization. Current asynchronous RL systems attempt to mitigate this, but they rely on global weight synchronization between the actor and all rollouts, which creates a rigid model update schedule. This global synchronization is ill-suited for the highly skewed and evolving distribution of trajectory generation latency in RL training, crippling training efficiency. Our key insight is that efficient scaling requires breaking this lockstep through trajectory-level asynchrony, in which each trajectory is generated and consumed independently. We propose Laminar, a scalable and robust RL post-training system built on a fully decoupled architecture. First, we replace global updates with a tier of relay workers acting as a distributed parameter service. This enables asynchronous and fine-grained weight synchronization, allowing rollouts to pull the latest weights at any time without stalling the actor's training loop. Second, a dynamic repack mechanism consolidates long-tail trajectories onto a few dedicated rollouts, maximizing generation throughput. The fully decoupled design also isolates failures, ensuring robustness for long-running jobs. Our evaluation on a 1024-GPU cluster shows that Laminar achieves up to 5.48$\times$ training throughput speedup over state-of-the-art systems, while reducing model convergence time.
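The sketch below illustrates the trajectory-level asynchrony described above under simplifying assumptions (it is not Laminar's code): a shared relay holds the latest weights, rollout workers with skewed latencies pull them without a barrier, and the trainer consumes trajectories one at a time from a queue.

```python
# Minimal sketch of trajectory-level asynchrony. A relay holds the latest actor
# weights; rollout workers pull whatever version is current, generate one
# trajectory at a time, and push it to a queue the trainer consumes without a
# global synchronization barrier. All names are illustrative.
import queue, threading, time

relay = {"version": 0, "weights": [0.0]}      # stands in for the relay-worker tier
trajectories: queue.Queue = queue.Queue()

def rollout_worker(worker_id: int, n: int) -> None:
    for _ in range(n):
        version = relay["version"]             # pull the latest weights, no barrier
        time.sleep(0.01 * (worker_id + 1))     # skewed generation latency
        trajectories.put({"worker": worker_id, "trained_on": version})

def trainer(total: int) -> None:
    consumed = 0
    while consumed < total:
        trajectories.get()                     # consume each trajectory independently
        relay["version"] += 1                  # update weights; rollouts see them next pull
        consumed += 1

if __name__ == "__main__":
    workers = [threading.Thread(target=rollout_worker, args=(i, 5)) for i in range(3)]
    t = threading.Thread(target=trainer, args=(15,))
    for w in workers:
        w.start()
    t.start()
    for w in workers:
        w.join()
    t.join()
    print("final weight version:", relay["version"])
```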
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Man-Made Heuristics Are Dead. Long Live Code Generators!
Dwivedula, Rohit, Saxena, Divyanshu, Akella, Aditya, Chaudhuri, Swarat, Kim, Daehyeok
Policy design for various system controllers has conventionally been a manual process, with domain experts carefully tailoring heuristics to the specific instance in which the policy will be deployed. In this paper, we re-imagine policy design via a novel automated search technique fueled by recent advances in generative models, specifically Large Language Model (LLM)-driven code generation. We outline the design and implementation of PolicySmith, a framework that applies LLMs to synthesize instance-optimal heuristics. We apply PolicySmith to two long-standing systems policies, web caching and congestion control, highlighting the opportunities opened up by this LLM-driven heuristic search. For caching, PolicySmith discovers heuristics that outperform established baselines on standard open-source traces. For congestion control, we show that PolicySmith can generate safe policies that integrate directly into the Linux kernel.
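A minimal sketch of the kind of generate-evaluate search loop this implies is shown below; `llm_generate_policy` is a hypothetical stub standing in for an LLM call, and the scoring is a toy proxy rather than PolicySmith's actual evaluation.

```python
# Minimal sketch (illustrative, not PolicySmith's code): an LLM-driven search loop
# over candidate heuristics. The stub emits policy source code; candidates are
# compiled, scored on a trace, and the best one is kept.
from typing import Callable

def llm_generate_policy(round_idx: int) -> str:
    # Placeholder for an LLM call; here it just emits a trivial caching policy.
    return f"def should_cache(size): return size < {10 * (round_idx + 1)}"

def evaluate(policy: Callable[[int], bool], trace: list[int]) -> float:
    return sum(1 for size in trace if policy(size)) / len(trace)   # toy hit-rate proxy

def search(trace: list[int], rounds: int = 5) -> tuple[str, float]:
    best_src, best_score = "", float("-inf")
    for r in range(rounds):
        src = llm_generate_policy(r)
        namespace: dict = {}
        exec(src, namespace)                   # compile the generated heuristic
        score = evaluate(namespace["should_cache"], trace)
        if score > best_score:
            best_src, best_score = src, score
    return best_src, best_score

if __name__ == "__main__":
    print(search([5, 12, 33, 7, 48, 9]))
```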
- North America > United States > Texas > Travis County > Austin (0.28)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.07)
- (16 more...)
Percepta: High Performance Stream Processing at the Edge
Sousa, Clarisse, Fonseca, Tiago, Ferreira, Luis Lino, Venâncio, Ricardo, Severino, Ricardo
The rise of real-time data and the proliferation of Internet of Things (IoT) devices have highlighted the limitations of cloud-centric solutions, particularly regarding latency, bandwidth, and privacy. These challenges have driven the growth of Edge Computing. IoT also brings a set of related problems: data rate harmonization between multiple sources, protocol conversion, handling the loss of data, and integration with Artificial Intelligence (AI) models. This paper presents Percepta, a lightweight Data Stream Processing (DSP) system tailored to support AI workloads at the edge, with a particular focus on Reinforcement Learning (RL). It introduces specialized features such as reward function computation, data storage for model retraining, and real-time data preparation to support continuous decision-making. Additional functionalities include data normalization, harmonization across heterogeneous protocols and sampling rates, and robust handling of missing or incomplete data, making it well-suited for the challenges of edge-based AI deployment.
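The toy pipeline below sketches the kind of pre-processing the abstract lists, under illustrative assumptions (it is not Percepta's API): forward-filling missing samples, harmonizing sources with different sampling rates, min-max normalization, and a simple reward computation for a downstream RL agent.

```python
# Minimal sketch of an edge stream-processing pipeline: harmonize readings from
# sources with different sampling rates, fill missing values with the last
# observation, normalize, and compute a toy reward. Field names are assumptions.
def harmonize(readings: dict[str, list], length: int) -> list[dict[str, float]]:
    last = {name: 0.0 for name in readings}
    rows = []
    for i in range(length):
        row = {}
        for name, series in readings.items():
            value = series[i] if i < len(series) and series[i] is not None else last[name]
            last[name] = value                 # forward-fill missing or late samples
            row[name] = value
        rows.append(row)
    return rows

def normalize(rows: list[dict[str, float]]) -> list[dict[str, float]]:
    keys = rows[0].keys()
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [{k: (r[k] - lo[k]) / (hi[k] - lo[k] or 1.0) for k in keys} for r in rows]

def reward(row: dict[str, float]) -> float:
    return row["generation"] - row["consumption"]   # toy reward: surplus energy

if __name__ == "__main__":
    stream = {"generation": [1.0, None, 3.0, 4.0], "consumption": [2.0, 2.5, None]}
    rows = normalize(harmonize(stream, 4))
    print([round(reward(r), 2) for r in rows])
```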
- Europe > Portugal > Porto > Porto (0.24)
- North America > United States > Washington > King County > Renton (0.04)
- Europe > Switzerland (0.04)
- (5 more...)
- Energy (1.00)
- Transportation > Ground > Road (0.47)
- Transportation > Electric Vehicle (0.47)
- Information Technology > Smart Houses & Appliances (0.35)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
A Case for Declarative LLM-friendly Interfaces for Improved Efficiency of Computer-Use Agents
Wang, Yuan, Li, Mingyu, Chen, Haibo
Computer-use agents (CUAs) powered by large language models (LLMs) have emerged as a promising approach to automating computer tasks, yet they struggle with graphical user interfaces (GUIs). GUIs, designed for humans, force LLMs to decompose high-level goals into lengthy, error-prone sequences of fine-grained actions, resulting in low success rates and an excessive number of LLM calls. We propose Goal-Oriented Interface (GOI), a novel abstraction that transforms existing GUIs into three declarative primitives: access, state, and observation, which are better suited for LLMs. Our key idea is policy-mechanism separation: LLMs focus on high-level semantic planning (policy) while GOI handles low-level navigation and interaction (mechanism). GOI does not require modifying the application source code or relying on application programming interfaces (APIs). We evaluate GOI with Microsoft Office Suite (Word, PowerPoint, Excel) on Windows. Compared to a leading GUI-based agent baseline, GOI improves task success rates by 67% and reduces interaction steps by 43.5%. Notably, GOI completes over 61% of successful tasks with a single LLM call.
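A minimal sketch of the policy/mechanism split is given below, with hypothetical names (`access`, `set_state`, `observe`, and the Word-style targets are illustrative, not the paper's API): the LLM-side policy issues three declarative calls while a mechanism layer hides the underlying GUI navigation.

```python
# Minimal sketch of the policy/mechanism separation. The LLM-side policy issues
# declarative calls; the mechanism layer is responsible for turning them into the
# underlying GUI navigation. All names and targets are hypothetical.
from dataclasses import dataclass, field

@dataclass
class GOIMechanism:
    ui_state: dict = field(default_factory=dict)

    def access(self, element: str) -> str:
        return f"navigated to {element}"         # low-level GUI navigation hidden here

    def set_state(self, element: str, value) -> None:
        self.ui_state[element] = value            # apply the declared target state

    def observe(self, element: str):
        return self.ui_state.get(element)         # read back what the GUI shows

def llm_policy(goi: GOIMechanism) -> None:
    # High-level semantic plan: three declarative calls instead of a long click sequence.
    goi.access("Word/Home/FontDialog")
    goi.set_state("Word/Selection/FontSize", 14)
    assert goi.observe("Word/Selection/FontSize") == 14

if __name__ == "__main__":
    llm_policy(GOIMechanism())
    print("declarative plan executed")
```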
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (13 more...)
FlexiQ: Adaptive Mixed-Precision Quantization for Latency/Accuracy Trade-Offs in Deep Neural Networks
Kim, Jaemin, Um, Hongjun, Kim, Sungkyun, Park, Yongjun, Seo, Jiwon
Neural networks commonly execute on hardware accelerators such as NPUs and GPUs due to their size and computation overhead. These accelerators are costly, and it is hard to scale their resources to handle real-time workload fluctuations. We present FlexiQ, an adaptive mixed-precision quantization scheme for computer vision models. FlexiQ selectively applies low-bitwidth computation to feature channels with small value ranges and employs an efficient bit-lowering method to minimize quantization errors while maintaining inference accuracy. Furthermore, FlexiQ adjusts its low-bitwidth channel ratio in real time, enabling quantized models to effectively manage fluctuating inference workloads. We implemented a FlexiQ prototype, including the mixed-precision inference runtime, on our custom NPU and GPUs. Evaluated on eleven convolution- and transformer-based vision models, FlexiQ achieves 6.6% higher accuracy on average for 4-bit models with finetuning and outperforms four state-of-the-art quantization techniques. Moreover, our mixed-precision models achieve an efficient accuracy-latency trade-off, with the 50% 4-bit model incurring only a 0.6% accuracy loss while achieving 40% of the speedup of the 100% 4-bit model over the 8-bit model. Latency evaluations on our NPU and GPUs confirm that FlexiQ introduces minimal runtime overhead, demonstrating its hardware efficiency and overall performance benefits.
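The following sketch illustrates the channel-selection idea under simple assumptions (it is not FlexiQ's runtime): channels with the smallest value ranges are quantized to 4 bits, the rest stay at 8 bits, and the low-bitwidth ratio is a parameter that can change per call.

```python
# Minimal sketch: rank feature channels by value range, quantize the smallest-range
# fraction to 4 bits and the rest to 8 bits. The ratio is adjustable, mirroring the
# idea of changing the low-bitwidth channel share under load. Illustrative only.
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    scale = (np.abs(x).max() or 1.0) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def mixed_precision(features: np.ndarray, low_bit_ratio: float) -> np.ndarray:
    # features: (channels, elements); rank channels by value range.
    ranges = features.max(axis=1) - features.min(axis=1)
    n_low = int(low_bit_ratio * features.shape[0])
    low_channels = np.argsort(ranges)[:n_low]        # small-range channels -> 4 bits
    out = np.empty_like(features)
    for c in range(features.shape[0]):
        out[c] = quantize(features[c], 4 if c in low_channels else 8)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(scale=[[0.1], [1.0], [0.2], [2.0]], size=(4, 16))
    err = np.abs(mixed_precision(feats, 0.5) - feats).mean()
    print(f"mean quantization error: {err:.4f}")
```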
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.05)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Washington > King County > Renton (0.04)
- (6 more...)
An LLM-based Agentic Framework for Accessible Network Control
Lin, Samuel, Zhou, Jiawei, Yu, Minlan
Traditional approaches to network management have been accessible only to a handful of highly trained network operators with significant expert knowledge. This creates barriers for lay users who wish to manage their networks without resorting to experts. With the recent development of powerful large language models (LLMs) for language comprehension, we design a system that makes network management accessible to a broader audience of non-experts by allowing users to converse with networks in natural language. To effectively leverage advancements in LLMs, we propose an agentic framework that uses an intermediate representation to streamline configuration across diverse vendor equipment, retrieves the network state from memory in real time, and provides an interface for external feedback. We also conduct pilot studies to collect real user data of natural language utterances for network control, and present a visualization interface to facilitate dialogue-driven user interaction and enable large-scale data collection for future development. Preliminary experiments validate the effectiveness of our proposed system components with LLM integration on both synthetic and real user utterances. Through our data collection and visualization efforts, we pave the way for more effective use of LLMs and democratize network control for everyday users.
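A toy sketch of the agentic flow is given below, with hypothetical names (not the paper's system): a natural language request is mapped to a vendor-neutral intermediate representation, the device's state is looked up in memory, and vendor-specific configuration is rendered from the IR.

```python
# Minimal sketch: natural language -> intermediate representation -> vendor-specific
# config, consulting a memory of current network state. Names and the sample config
# line are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class IR:
    action: str
    device: str
    params: dict

NETWORK_STATE = {"router1": {"vendor": "cisco", "acl": []}}   # stands in for state memory

def parse_utterance(text: str) -> IR:
    # Placeholder for an LLM call that maps language to the IR.
    return IR(action="block", device="router1", params={"host": "10.0.0.5"})

def render(ir: IR) -> str:
    vendor = NETWORK_STATE[ir.device]["vendor"]               # retrieve state from memory
    if vendor == "cisco":
        return f"access-list 101 deny ip host {ir.params['host']} any"
    return f"set firewall family inet filter BLOCK from address {ir.params['host']}"

if __name__ == "__main__":
    ir = parse_utterance("please block traffic from 10.0.0.5 on router1")
    print(render(ir))
```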
- North America > United States > New York > New York County > New York City (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Brazil (0.04)
- (6 more...)
- Information Technology > Networks (0.69)
- Information Technology > Security & Privacy (0.66)
- Telecommunications > Networks (0.51)
Boosting Skeleton-Driven SMT Solver Fuzzing by Leveraging LLM to Produce Formula Generators
Sun, Maolin, Yang, Yibiao, Zhou, Yuming
Satisfiability Modulo Theories (SMT) solvers are foundational to modern systems and programming languages research, underpinning tasks like symbolic execution and automated verification. Because these solvers sit on the critical path, their correctness is essential, and high-quality test formulas are key to uncovering bugs. However, while prior testing techniques performed well on earlier solver versions, they struggle to keep pace with rapidly evolving features. Recent approaches based on Large Language Models (LLMs) show promise in exploring advanced solver capabilities, but two obstacles remain: nearly half of the generated formulas are syntactically invalid, and iterative interactions with the LLMs introduce substantial computational overhead. In this study, we present Chimera, a novel LLM-assisted fuzzing framework that addresses both issues by shifting from direct formula generation to the synthesis of reusable term (i.e., logical expression) generators. Specifically, Chimera uses LLMs to (1) automatically extract context-free grammars (CFGs) for SMT theories, including solver-specific extensions, from documentation, and (2) synthesize composable Boolean term generators that adhere to these grammars. During fuzzing, Chimera populates structural skeletons derived from existing formulas with terms iteratively produced by the LLM-synthesized generators. This design ensures syntactic validity while promoting semantic diversity. Notably, Chimera requires only a one-time LLM interaction investment, dramatically reducing runtime cost. We evaluated Chimera on two leading SMT solvers: Z3 and cvc5. Our experiments show that Chimera has identified 43 confirmed bugs, 40 of which have already been fixed by developers.
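The snippet below sketches the skeleton-filling step under illustrative assumptions (not Chimera's code): a hand-written term generator stands in for an LLM-synthesized, grammar-conforming one, and each Boolean hole in a skeleton is filled with a generated term.

```python
# Minimal sketch: structural skeletons taken from existing SMT-LIB formulas keep
# their shape, and each Boolean hole is filled by a term generator of the kind the
# LLM would synthesize offline. The generator here is a tiny hand-written stand-in
# that emits syntactically valid integer-theory terms.
import random

def boolean_term_generator(rng: random.Random, depth: int = 2) -> str:
    # Stand-in for an LLM-synthesized, grammar-conforming generator.
    if depth == 0:
        return rng.choice(["(> x 0)", "(= y 1)", "(distinct x y)"])
    op = rng.choice(["and", "or", "not"])
    if op == "not":
        return f"(not {boolean_term_generator(rng, depth - 1)})"
    return f"({op} {boolean_term_generator(rng, depth - 1)} {boolean_term_generator(rng, depth - 1)})"

def fill_skeleton(skeleton: str, rng: random.Random) -> str:
    while "<HOLE>" in skeleton:
        skeleton = skeleton.replace("<HOLE>", boolean_term_generator(rng), 1)
    return skeleton

if __name__ == "__main__":
    skeleton = "(declare-const x Int) (declare-const y Int) (assert <HOLE>) (assert <HOLE>) (check-sat)"
    print(fill_skeleton(skeleton, random.Random(0)))
```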
- North America > United States > New York > New York County > New York City (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.05)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (22 more...)
Active Learning and Transfer Learning for Anomaly Detection in Time-Series Data
Kelleher, John D., Nicholson, Matthew, Agrahari, Rahul, Conran, Clare
This paper examines the effectiveness of combining active learning and transfer learning for anomaly detection in cross-domain time-series data. Our results indicate that there is an interaction between clustering and active learning, and that in general the best performance is achieved using a single cluster (in other words, when clustering is not applied). We also find that adding new samples to the training set using active learning does improve model performance, but that in general the rate of improvement is slower than the results reported in the literature suggest. We attribute this difference to an improved experimental design in which distinct data samples are used for the sampling and testing pools. Finally, we assess the ceiling performance of transfer learning in combination with active learning across several datasets and find that performance does initially improve but eventually begins to tail off as more target points are selected for inclusion in training. This tail-off in performance may indicate that the active learning process is doing a good job of sequencing data points for selection, pushing the less useful points towards the end of the selection process, and that the tail-off occurs when these less useful points are eventually added. Taken together, our results indicate that active learning is effective, but that the improvement in model performance follows a flat linear function of the number of points selected and labelled.
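For context on the procedure being evaluated, here is a minimal uncertainty-sampling active learning loop with distinct sampling and testing pools (illustrative assumptions only; scikit-learn's LogisticRegression stands in for the anomaly detector).

```python
# Minimal sketch of an uncertainty-sampling active learning loop with distinct
# sampling and testing pools, as the abstract emphasizes. Synthetic data and the
# choice of classifier are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train = X[:20].copy(), y[:20].copy()      # small labelled seed set
X_pool, y_pool = X[20:200], y[20:200]                 # sampling pool
X_test, y_test = X[200:], y[200:]                     # distinct testing pool

pool_mask = np.ones(len(X_pool), dtype=bool)
for _ in range(10):
    model = LogisticRegression().fit(X_train, y_train)
    proba = model.predict_proba(X_pool[pool_mask])[:, 1]
    pick = np.flatnonzero(pool_mask)[np.argmin(np.abs(proba - 0.5))]   # most uncertain point
    X_train = np.vstack([X_train, X_pool[pick]])
    y_train = np.append(y_train, y_pool[pick])
    pool_mask[pick] = False
    print(f"labelled={len(y_train)} test accuracy={model.score(X_test, y_test):.3f}")
```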
- Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Washington > King County > Renton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)